Overview
Dataset statistics
| Number of variables | 8 |
|---|---|
| Number of observations | 992 |
| Missing cells | 60 |
| Missing cells (%) | 0.8% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 149.2 KiB |
| Average record size in memory | 154.0 B |
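The two derived figures in this table can be cross-checked against the others. The sketch below assumes that "Missing cells (%)" is the missing-cell count divided by rows × columns, and that the average record size is the total size divided by the number of observations. The variable names are illustrative and are not part of the report.

```python
# Values copied from the Dataset statistics table.
n_variables = 8
n_observations = 992
missing_cells = 60
total_size_kib = 149.2

# Assumption: the percentage is taken over all cells (rows x columns).
missing_pct = missing_cells / (n_observations * n_variables) * 100

# Assumption: 1 KiB = 1024 bytes, averaged over the observations.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions both values round to the figures shown in the table.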
Variable types
| Numeric | 6 |
|---|---|
| Categorical | 1 |
| Text | 1 |
Alerts
| Alert | Type |
|---|---|
| Height is highly overall correlated with Sex and 1 other field | High correlation |
| Sex is highly overall correlated with Height | High correlation |
| Weight is highly overall correlated with Height | High correlation |
| Humidity has 30 (3.0%) missing values | Missing |
| Temperature has 30 (3.0%) missing values | Missing |
| ID_test has unique values | Unique |
Reproduction
| Analysis started | 2025-04-13 15:05:33.142609 |
|---|---|
| Analysis finished | 2025-04-13 15:06:52.142096 |
| Duration | 1 minute and 19 seconds |
| Software version | ydata-profiling v4.16.1 |
| Download configuration | config.json |
Variables
Age
Real number (ℝ)
| Distinct | 348 |
|---|---|
| Distinct (%) | 35.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 28.979133 |
| Minimum | 10.8 |
| Maximum | 63 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 10.8 |
|---|---|
| 5-th percentile | 15.7 |
| Q1 | 21.1 |
| median | 27.1 |
| Q3 | 36.325 |
| 95-th percentile | 47 |
| Maximum | 63 |
| Range | 52.2 |
| Interquartile range (IQR) | 15.225 |
Descriptive statistics
| Standard deviation | 10.076653 |
|---|---|
| Coefficient of variation (CV) | 0.34772098 |
| Kurtosis | -0.24545327 |
| Mean | 28.979133 |
| Median Absolute Deviation (MAD) | 7.65 |
| Skewness | 0.59914948 |
| Sum | 28747.3 |
| Variance | 101.53893 |
| Monotonicity | Increasing |
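Several of the Age statistics are functions of the others, which gives a quick consistency check. The sketch below assumes the usual definitions (mean = sum / n, variance = standard deviation squared, CV = standard deviation / mean, range = max − min, IQR = Q3 − Q1); the dictionary names are illustrative.

```python
import math

# Values copied from the Age tables.
n = 992
total = 28747.3
mean = 28.979133
std = 10.076653
minimum, maximum = 10.8, 63
q1, q3 = 21.1, 36.325

# Statistics recomputed from the values above.
derived = {
    "mean": total / n,
    "variance": std ** 2,
    "cv": std / mean,
    "range": maximum - minimum,
    "iqr": q3 - q1,
}

# The same statistics as printed in the report.
reported = {
    "mean": 28.979133,
    "variance": 101.53893,
    "cv": 0.34772098,
    "range": 52.2,
    "iqr": 15.225,
}

for name, value in derived.items():
    # Loose tolerance because the report rounds to about 8 significant digits.
    ok = math.isclose(value, reported[name], rel_tol=1e-6)
    print(f"{name}: derived={value:.6f} reported={reported[name]} match={ok}")
```

The same check can be repeated for the other numeric variables by swapping in their table values.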
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 16.1 | 11 | 1.1% |
| 26.9 | 10 | 1.0% |
| 25.9 | 9 | 0.9% |
| 31.9 | 8 | 0.8% |
| 39.1 | 8 | 0.8% |
| 16.3 | 8 | 0.8% |
| 22.7 | 8 | 0.8% |
| 22.4 | 8 | 0.8% |
| 16.2 | 7 | 0.7% |
| 26.4 | 7 | 0.7% |
| Other values (338) | 908 | 91.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10.8 | 1 | 0.1% |
| 11.8 | 1 | 0.1% |
| 12.2 | 1 | 0.1% |
| 13.2 | 1 | 0.1% |
| 13.7 | 1 | 0.1% |
| 13.8 | 1 | 0.1% |
| 14 | 1 | 0.1% |
| 14.1 | 4 | 0.4% |
| 14.2 | 5 | 0.5% |
| 14.3 | 5 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 63 | 1 | 0.1% |
| 61.6 | 1 | 0.1% |
| 61.3 | 1 | 0.1% |
| 59.7 | 1 | 0.1% |
| 59.1 | 1 | 0.1% |
| 58.7 | 1 | 0.1% |
| 58.5 | 1 | 0.1% |
| 57.6 | 1 | 0.1% |
| 55.5 | 1 | 0.1% |
| 55.4 | 2 | 0.2% |
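The Frequency (%) column in these tables appears to be each count divided by the 992 observations, shown to one decimal place. A minimal sketch of that calculation follows; the function name `frequency_pct` is hypothetical and the rounding rule is an assumption inferred from the printed values.

```python
N_OBSERVATIONS = 992  # "Number of observations" from the overview


def frequency_pct(count: int, total: int = N_OBSERVATIONS) -> str:
    """Format count / total as a percentage with one decimal place."""
    return f"{count / total:.1%}"


print(frequency_pct(11))   # count for Age value 16.1
print(frequency_pct(908))  # count for "Other values (338)"
```

The denominator here is the full row count, so for columns with missing values the frequencies of the observed values would not sum to 100%.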
Weight
Real number (ℝ)
High correlation 
| Distinct | 264 |
|---|---|
| Distinct (%) | 26.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 73.383367 |
| Minimum | 41 |
| Maximum | 135 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 41 |
|---|---|
| 5-th percentile | 54.165 |
| Q1 | 66 |
| median | 73 |
| Q3 | 80.225 |
| 95-th percentile | 93.135 |
| Maximum | 135 |
| Range | 94 |
| Interquartile range (IQR) | 14.225 |
Descriptive statistics
| Standard deviation | 12.005361 |
|---|---|
| Coefficient of variation (CV) | 0.16359785 |
| Kurtosis | 1.4049884 |
| Mean | 73.383367 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | 0.49731123 |
| Sum | 72796.3 |
| Variance | 144.12869 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 73 | 41 | 4.1% |
| 74 | 30 | 3.0% |
| 67 | 29 | 2.9% |
| 72 | 27 | 2.7% |
| 76 | 27 | 2.7% |
| 71 | 26 | 2.6% |
| 70 | 26 | 2.6% |
| 75 | 25 | 2.5% |
| 66 | 24 | 2.4% |
| 78 | 24 | 2.4% |
| Other values (254) | 713 | 71.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 41 | 1 | 0.1% |
| 41.9 | 1 | 0.1% |
| 42 | 1 | 0.1% |
| 46 | 3 | 0.3% |
| 46.6 | 1 | 0.1% |
| 47.2 | 1 | 0.1% |
| 48 | 1 | 0.1% |
| 48.6 | 1 | 0.1% |
| 48.8 | 1 | 0.1% |
| 49 | 2 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 135 | 1 | 0.1% |
| 127 | 1 | 0.1% |
| 122 | 1 | 0.1% |
| 116 | 1 | 0.1% |
| 113 | 1 | 0.1% |
| 112.2 | 1 | 0.1% |
| 110 | 2 | 0.2% |
| 109 | 1 | 0.1% |
| 108.3 | 1 | 0.1% |
| 108 | 2 | 0.2% |
Height
Real number (ℝ)
High correlation 
| Distinct | 182 |
|---|---|
| Distinct (%) | 18.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 174.91351 |
| Minimum | 150 |
| Maximum | 203 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 150 |
|---|---|
| 5-th percentile | 161.775 |
| Q1 | 170 |
| median | 175 |
| Q3 | 180 |
| 95-th percentile | 187 |
| Maximum | 203 |
| Range | 53 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 7.9500267 |
|---|---|
| Coefficient of variation (CV) | 0.045451188 |
| Kurtosis | 0.3030682 |
| Mean | 174.91351 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | -0.04549302 |
| Sum | 173514.2 |
| Variance | 63.202925 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 178 | 47 | 4.7% |
| 174 | 44 | 4.4% |
| 176 | 42 | 4.2% |
| 172 | 40 | 4.0% |
| 173 | 39 | 3.9% |
| 179 | 38 | 3.8% |
| 171 | 37 | 3.7% |
| 170 | 35 | 3.5% |
| 175 | 34 | 3.4% |
| 180 | 34 | 3.4% |
| Other values (172) | 602 | 60.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 150 | 1 | 0.1% |
| 151 | 1 | 0.1% |
| 152 | 2 | 0.2% |
| 153 | 1 | 0.1% |
| 154 | 3 | 0.3% |
| 155 | 3 | 0.3% |
| 156 | 4 | 0.4% |
| 157 | 5 | 0.5% |
| 158 | 5 | 0.5% |
| 159 | 4 | 0.4% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 203 | 1 | 0.1% |
| 201 | 1 | 0.1% |
| 199 | 1 | 0.1% |
| 197.5 | 1 | 0.1% |
| 197.4 | 1 | 0.1% |
| 197 | 4 | 0.4% |
| 195 | 1 | 0.1% |
| 194.2 | 1 | 0.1% |
| 193 | 5 | 0.5% |
| 192 | 2 | 0.2% |
Humidity
Real number (ℝ)
Missing 
| Distinct | 52 |
|---|---|
| Distinct (%) | 5.4% |
| Missing | 30 |
| Missing (%) | 3.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 48.211435 |
| Minimum | 23.7 |
| Maximum | 69 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 23.7 |
|---|---|
| 5-th percentile | 35 |
| Q1 | 42 |
| median | 47 |
| Q3 | 54 |
| 95-th percentile | 64 |
| Maximum | 69 |
| Range | 45.3 |
| Interquartile range (IQR) | 12 |
Descriptive statistics
| Standard deviation | 8.560991 |
|---|---|
| Coefficient of variation (CV) | 0.17757179 |
| Kurtosis | -0.58075794 |
| Mean | 48.211435 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | 0.32211548 |
| Sum | 46379.4 |
| Variance | 73.290566 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 47 | 61 | 6.1% |
| 44 | 49 | 4.9% |
| 43 | 47 | 4.7% |
| 42 | 47 | 4.7% |
| 45 | 44 | 4.4% |
| 52 | 43 | 4.3% |
| 39 | 42 | 4.2% |
| 41 | 39 | 3.9% |
| 40 | 34 | 3.4% |
| 49 | 34 | 3.4% |
| Other values (42) | 522 | 52.6% |
| (Missing) | 30 | 3.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 23.7 | 1 | 0.1% |
| 28 | 1 | 0.1% |
| 31 | 3 | 0.3% |
| 32 | 6 | 0.6% |
| 32.2 | 2 | 0.2% |
| 33 | 6 | 0.6% |
| 34 | 12 | 1.2% |
| 35 | 22 | 2.2% |
| 36 | 10 | 1.0% |
| 37 | 13 | 1.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 69 | 2 | 0.2% |
| 68 | 3 | 0.3% |
| 67 | 14 | 1.4% |
| 66 | 14 | 1.4% |
| 65.9 | 1 | 0.1% |
| 65 | 9 | 0.9% |
| 64 | 9 | 0.9% |
| 63 | 8 | 0.8% |
| 62 | 14 | 1.4% |
| 61 | 16 | 1.6% |
Temperature
Real number (ℝ)
Missing 
| Distinct | 122 |
|---|---|
| Distinct (%) | 12.7% |
| Missing | 30 |
| Missing (%) | 3.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22.818565 |
| Minimum | 15 |
| Maximum | 32.3 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 15 |
|---|---|
| 5-th percentile | 18.3 |
| Q1 | 20.8 |
| median | 22.9 |
| Q3 | 24.4 |
| 95-th percentile | 27.395 |
| Maximum | 32.3 |
| Range | 17.3 |
| Interquartile range (IQR) | 3.6 |
Descriptive statistics
| Standard deviation | 2.7840663 |
|---|---|
| Coefficient of variation (CV) | 0.12200882 |
| Kurtosis | 0.47360183 |
| Mean | 22.818565 |
| Median Absolute Deviation (MAD) | 1.8 |
| Skewness | 0.33509091 |
| Sum | 21951.46 |
| Variance | 7.751025 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 23.8 | 34 | 3.4% |
| 22 | 26 | 2.6% |
| 21 | 22 | 2.2% |
| 23.5 | 22 | 2.2% |
| 20 | 21 | 2.1% |
| 22.3 | 20 | 2.0% |
| 23.3 | 19 | 1.9% |
| 25 | 19 | 1.9% |
| 24.3 | 19 | 1.9% |
| 20.3 | 19 | 1.9% |
| Other values (112) | 741 | 74.7% |
| (Missing) | 30 | 3.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 15 | 3 | 0.3% |
| 16.5 | 2 | 0.2% |
| 16.7 | 1 | 0.1% |
| 16.8 | 2 | 0.2% |
| 16.9 | 2 | 0.2% |
| 17 | 3 | 0.3% |
| 17.1 | 3 | 0.3% |
| 17.2 | 1 | 0.1% |
| 17.3 | 3 | 0.3% |
| 17.4 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 32.3 | 1 | 0.1% |
| 32 | 3 | 0.3% |
| 30.6 | 5 | 0.5% |
| 30.5 | 1 | 0.1% |
| 30.1 | 3 | 0.3% |
| 29.7 | 1 | 0.1% |
| 29.5 | 5 | 0.5% |
| 29.4 | 7 | 0.7% |
| 29.3 | 6 | 0.6% |
| 29.1 | 4 | 0.4% |
Sex
Categorical
High correlation 
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 48.6 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 843 | 85.0% |
| 1 | 149 | 15.0% |
Length
Histogram of lengths of the category
Common Values (Plot)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 843 | 85.0% |
| 1 | 149 | 15.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 843 | 85.0% |
| 1 | 149 | 15.0% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 992 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 843 | 85.0% |
| 1 | 149 | 15.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 992 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 843 | 85.0% |
| 1 | 149 | 15.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 992 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 843 | 85.0% |
| 1 | 149 | 15.0% |
ID
Real number (ℝ)
| Distinct | 857 |
|---|---|
| Distinct (%) | 86.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 424.89012 |
| Minimum | 1 |
| Maximum | 857 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 45 |
| Q1 | 214.75 |
| median | 428.5 |
| Q3 | 626.25 |
| 95-th percentile | 813.45 |
| Maximum | 857 |
| Range | 856 |
| Interquartile range (IQR) | 411.5 |
Descriptive statistics
| Standard deviation | 243.83248 |
|---|---|
| Coefficient of variation (CV) | 0.57387185 |
| Kurtosis | -1.1422165 |
| Mean | 424.89012 |
| Median Absolute Deviation (MAD) | 206 |
| Skewness | 0.0083793149 |
| Sum | 421491 |
| Variance | 59454.278 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 506 | 5 | 0.5% |
| 492 | 4 | 0.4% |
| 511 | 3 | 0.3% |
| 552 | 3 | 0.3% |
| 553 | 3 | 0.3% |
| 351 | 3 | 0.3% |
| 417 | 3 | 0.3% |
| 58 | 3 | 0.3% |
| 499 | 3 | 0.3% |
| 99 | 3 | 0.3% |
| Other values (847) | 959 | 96.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.1% |
| 2 | 1 | 0.1% |
| 3 | 1 | 0.1% |
| 4 | 1 | 0.1% |
| 5 | 1 | 0.1% |
| 6 | 1 | 0.1% |
| 7 | 2 | 0.2% |
| 8 | 1 | 0.1% |
| 9 | 1 | 0.1% |
| 10 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 857 | 1 | 0.1% |
| 856 | 3 | 0.3% |
| 855 | 2 | 0.2% |
| 854 | 1 | 0.1% |
| 853 | 1 | 0.1% |
| 852 | 1 | 0.1% |
| 851 | 1 | 0.1% |
| 850 | 1 | 0.1% |
| 849 | 1 | 0.1% |
| 848 | 1 | 0.1% |
ID_test
Text
Unique 
| Distinct | 992 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.4 KiB |
Length
| Max length | 7 |
|---|---|
| Median length | 5 |
| Mean length | 4.9747984 |
| Min length | 3 |
Unique
| Unique | 992 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | 543_1 |
|---|---|
| 2nd row | 11_1 |
| 3rd row | 829_1 |
| 4th row | 284_1 |
| 5th row | 341_1 |
| Value | Count | Frequency (%) |
|---|---|---|
| 543_1 | 1 | 0.1% |
| 344_1 | 1 | 0.1% |
| 336_5 | 1 | 0.1% |
| 829_1 | 1 | 0.1% |
| 284_1 | 1 | 0.1% |
| 341_1 | 1 | 0.1% |
| 341_2 | 1 | 0.1% |
| 343_1 | 1 | 0.1% |
| 330_1 | 1 | 0.1% |
| 338_1 | 1 | 0.1% |
| Other values (982) | 982 | 99.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1228 | 24.9% |
| _ | 992 | 20.1% |
| 5 | 368 | 7.5% |
| 3 | 360 | 7.3% |
| 4 | 348 | 7.1% |
| 2 | 337 | 6.8% |
| 6 | 320 | 6.5% |
| 7 | 306 | 6.2% |
| 8 | 266 | 5.4% |
| 9 | 209 | 4.2% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 4935 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1228 | 24.9% |
| _ | 992 | 20.1% |
| 5 | 368 | 7.5% |
| 3 | 360 | 7.3% |
| 4 | 348 | 7.1% |
| 2 | 337 | 6.8% |
| 6 | 320 | 6.5% |
| 7 | 306 | 6.2% |
| 8 | 266 | 5.4% |
| 9 | 209 | 4.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 4935 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1228 | 24.9% |
| _ | 992 | 20.1% |
| 5 | 368 | 7.5% |
| 3 | 360 | 7.3% |
| 4 | 348 | 7.1% |
| 2 | 337 | 6.8% |
| 6 | 320 | 6.5% |
| 7 | 306 | 6.2% |
| 8 | 266 | 5.4% |
| 9 | 209 | 4.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 4935 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1228 | 24.9% |
| _ | 992 | 20.1% |
| 5 | 368 | 7.5% |
| 3 | 360 | 7.3% |
| 4 | 348 | 7.1% |
| 2 | 337 | 6.8% |
| 6 | 320 | 6.5% |
| 7 | 306 | 6.2% |
| 8 | 266 | 5.4% |
| 9 | 209 | 4.2% |
Interactions
Correlations
| | Age | Height | Humidity | ID | Sex | Temperature | Weight |
|---|---|---|---|---|---|---|---|
| Age | 1.000 | -0.003 | -0.042 | 0.073 | 0.125 | -0.133 | 0.196 |
| Height | -0.003 | 1.000 | -0.015 | -0.010 | 0.529 | 0.031 | 0.706 |
| Humidity | -0.042 | -0.015 | 1.000 | -0.197 | 0.039 | -0.110 | 0.040 |
| ID | 0.073 | -0.010 | -0.197 | 1.000 | 0.369 | -0.336 | -0.049 |
| Sex | 0.125 | 0.529 | 0.039 | 0.369 | 1.000 | 0.054 | 0.445 |
| Temperature | -0.133 | 0.031 | -0.110 | -0.336 | 0.054 | 1.000 | -0.020 |
| Weight | 0.196 | 0.706 | 0.040 | -0.049 | 0.445 | -0.020 | 1.000 |
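The "High correlation" alerts can be related back to this matrix by listing the pairs whose absolute coefficient reaches a cutoff. The sketch below assumes a cutoff of 0.5, which is consistent with the alerts in this report (Height with Sex and Weight) but is not stated in it; `THRESHOLD` and `corr` are illustrative names.

```python
# Upper triangle of the correlation matrix (it is symmetric, so each pair once).
corr = {
    ("Age", "Height"): -0.003, ("Age", "Humidity"): -0.042,
    ("Age", "ID"): 0.073, ("Age", "Sex"): 0.125,
    ("Age", "Temperature"): -0.133, ("Age", "Weight"): 0.196,
    ("Height", "Humidity"): -0.015, ("Height", "ID"): -0.010,
    ("Height", "Sex"): 0.529, ("Height", "Temperature"): 0.031,
    ("Height", "Weight"): 0.706,
    ("Humidity", "ID"): -0.197, ("Humidity", "Sex"): 0.039,
    ("Humidity", "Temperature"): -0.110, ("Humidity", "Weight"): 0.040,
    ("ID", "Sex"): 0.369, ("ID", "Temperature"): -0.336,
    ("ID", "Weight"): -0.049,
    ("Sex", "Temperature"): 0.054, ("Sex", "Weight"): 0.445,
    ("Temperature", "Weight"): -0.020,
}

THRESHOLD = 0.5  # assumed cutoff, not taken from the report

# Keep pairs whose absolute correlation meets the cutoff.
high = sorted(pair for pair, r in corr.items() if abs(r) >= THRESHOLD)

for a, b in high:
    print(f"{a} ~ {b}: {corr[(a, b)]:+.3f}")
```

With this cutoff, Sex and Weight (0.445) fall just below the line, which matches the absence of a Sex/Weight alert.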
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
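To make the idea of nullity correlation concrete, the sketch below counts missing values per column and computes a Pearson correlation between the "is missing" indicators of two columns. The records are hypothetical toy data, not rows from this dataset, and the helper names are illustrative; the plots in the report may be computed differently.

```python
import math

# Hypothetical records: None marks a missing reading.
rows = [
    {"Humidity": 39.0, "Temperature": 20.7},
    {"Humidity": None, "Temperature": None},
    {"Humidity": None, "Temperature": None},
    {"Humidity": 49.0, "Temperature": 23.8},
    {"Humidity": 40.0, "Temperature": None},
]


def null_counts(records):
    """Number of missing (None) values per column."""
    counts = {}
    for record in records:
        for column, value in record.items():
            counts[column] = counts.get(column, 0) + int(value is None)
    return counts


def nullity_correlation(records, a, b):
    """Pearson correlation between the missing-value indicators of a and b."""
    x = [int(r[a] is None) for r in records]
    y = [int(r[b] is None) for r in records]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    var_y = sum((yi - my) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)


print(null_counts(rows))
print(nullity_correlation(rows, "Humidity", "Temperature"))
```

A value near 1 would mean the two columns tend to be missing in the same rows; a value near 0 would mean their gaps are unrelated. The function is undefined for a column with no missing values or only missing values, since the indicator then has zero variance.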
Sample
First rows
| | Age | Weight | Height | Humidity | Temperature | Sex | ID | ID_test |
|---|---|---|---|---|---|---|---|---|
| 0 | 10.8 | 48.8 | 163.0 | 39.0 | 20.7 | 1 | 543 | 543_1 |
| 1 | 11.8 | 41.0 | 150.0 | 41.0 | 22.3 | 1 | 11 | 11_1 |
| 2 | 12.2 | 46.0 | 160.0 | 37.0 | 21.5 | 0 | 829 | 829_1 |
| 3 | 13.2 | 71.0 | 190.0 | 49.0 | 23.8 | 1 | 284 | 284_1 |
| 4 | 13.7 | 53.8 | 169.7 | 40.0 | 25.3 | 0 | 341 | 341_1 |
| 5 | 13.8 | 53.4 | 171.0 | 42.0 | 24.4 | 0 | 341 | 341_2 |
| 6 | 14.0 | 46.0 | 160.0 | 40.0 | 25.3 | 0 | 343 | 343_1 |
| 7 | 14.1 | 50.0 | 168.9 | 42.0 | 24.2 | 0 | 330 | 330_1 |
| 8 | 14.1 | 47.2 | 160.2 | 40.0 | 25.2 | 0 | 338 | 338_1 |
| 9 | 14.1 | 49.7 | 160.1 | 40.0 | 25.8 | 0 | 339 | 339_1 |
Last rows
| | Age | Weight | Height | Humidity | Temperature | Sex | ID | ID_test |
|---|---|---|---|---|---|---|---|---|
| 982 | 55.4 | 78.0 | 175.6 | 51.0 | 23.4 | 0 | 597 | 597_1 |
| 983 | 55.5 | 61.5 | 168.8 | 44.0 | 21.0 | 0 | 598 | 598_1 |
| 984 | 57.6 | 67.0 | 169.0 | 47.0 | 18.4 | 0 | 389 | 389_1 |
| 985 | 58.5 | 64.0 | 157.0 | 35.0 | 21.5 | 1 | 755 | 755_1 |
| 986 | 58.7 | 66.0 | 171.3 | 38.0 | 15.0 | 0 | 856 | 856_1 |
| 987 | 59.1 | 64.7 | 172.0 | 38.0 | 24.4 | 0 | 856 | 856_2 |
| 988 | 59.7 | 65.2 | 172.0 | 51.0 | 16.8 | 0 | 856 | 856_3 |
| 989 | 61.3 | 102.0 | 185.0 | 56.0 | 20.5 | 0 | 390 | 390_1 |
| 990 | 61.6 | 74.0 | 169.0 | 46.0 | 23.9 | 0 | 596 | 596_1 |
| 991 | 63.0 | 83.5 | 171.5 | 48.0 | 22.2 | 0 | 296 | 296_1 |
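The sample rows suggest that ID_test is the ID followed by a running count of how many times that ID has appeared so far (for example 341_1, 341_2 and 856_1, 856_2, 856_3). That would explain why ID has 857 distinct values while ID_test is unique across all 992 rows. The sketch below reconstructs that pattern under this assumption; the function name `make_id_test` is hypothetical, and the way the column was actually built is not stated in the report.

```python
from collections import defaultdict


def make_id_test(ids):
    """Append a per-ID occurrence counter: 341, 341 -> '341_1', '341_2'."""
    seen = defaultdict(int)
    labels = []
    for value in ids:
        seen[value] += 1
        labels.append(f"{value}_{seen[value]}")
    return labels


# ID column of the first ten sample rows.
first_ids = [543, 11, 829, 284, 341, 341, 343, 330, 338, 339]
print(make_id_test(first_ids))
```

Applied to the first ten IDs, this yields the same labels as the ID_test column in those rows.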